1. Introduction and Background


As computational biology continues to play a larger role in applied research, collaborations
between simulation and experiment can benefit from software that provides access to various
styles of methods, enabling customization and rapid application to a wide range of biological
systems. A key work by Elcock et al. set the stage in computational biology for this style of
adaptable research early in the last decade (5). In their investigation of the driving forces in
protein-protein interactions, the authors draw upon several computational methods to characterize
interaction energy and orientation, and their effects on macroscopic quantities like the
second-order virial coefficient. Years later, researchers have at their disposal many software
options for tackling the handful of diverse systems involving biomolecular interactions. However,
features that make software in computational biology accessible, usable, and powerful remain
characteristic of a select few (4-9). Therefore, the challenge presented to this decade’s
researcher is how to efficiently use these software packages in concert to study aspects of a
biological system.

We focus on open-source simulation packages, and unite these powerful, sometimes
disparate modes of simulation with a simple, extensible, and object-oriented Python suite of code
called XPairIt. Joining such programs provides enhanced simulation functionality through a
concerted use of each program’s multiple length scales, interatomic potentials, and simulation
methods. This is accomplished with the parallelized core of XPairIt, which additionally contains
many data management and organization options, analysis tools, and custom simulation
methodology. Two open-source simulation APIs offer options similar to XPairIt. The first is the
Molecular Modeling Toolkit (MMTK) (10). MMTK, initially developed about a decade ago, is
written in Python using object-oriented design, and is a more stand-alone piece of software when
compared to XPairIt. It incorporates internal molecular dynamics and Monte Carlo algorithms,
several interatomic potentials, and normal mode analysis into a single software package for the
simulation of protein systems. The second is SimTK’s OpenMM, which is another general
simulation toolkit using graphics processing unit (GPU) hardware acceleration (11). These
toolkits differ from XPairIt in that much of the simulation in MMTK and OpenMM is done
within the core of the toolkit, whereas in XPairIt the simulation engines are typically external
software packages. In this application of the XPairIt API, we create a freeware, extensible,
high-performance computing (HPC)-ready, multi-scale biomolecular simulator built to include several
industry-proven external software packages to tackle many of the current challenges in modeling
and analyzing peptide-protein complexes.

The main simulation field specializing in the investigation of peptide-protein interactions is
known as “docking”: the efficient simulation of the interaction of two biomolecules to
determine their natural and preferred orientation when in contact with one another. Some of the
multi-method functionality for molecular docking offered by XPairIt is currently available in a
few commercial software suites, such as Molecular Operating Environment (MOE) or
Schrödinger through its PIPER and JAGUAR modules (5, 9). While these software packages are
largely successful in modeling many types of biomolecular interactions, several
shortcomings become apparent when they are applied to the docking of certain molecules, such as
peptides (12, 13).

Recent software innovations have attempted to address the challenges of incorporating peptide
flexibility in docking simulations and adequately ranking (“scoring”) the thousands of generated
structures to determine the likely binding location. Many use the Rosetta Modeling Suite, which
is a software package for protein structure prediction, protein-protein and small molecule
docking, and protein design (7, 14). Rosetta was used to study protein-protein docking using only
fixed protein displacement and side-chain rotamers, but the authors highlight possible
improvement by allowing for a flexible backbone (15, 16). Rosetta was also used to study
peptides where a starting peptide-protein crystal structure was known, and small Monte Carlo
(MC) simulations of the peptide backbone were incorporated. The resulting docked structures
exhibited more hydrogen bonds and better van der Waals contact (17). Other modes of Rosetta
have also been used to improve docking. For example, Sammond et al. use a combination of MC
backbone moves and the design feature of Rosetta to optimize peptide structure and sequence
when bound to a G-protein α subunit (18). Additionally, work by Raveh et al. incorporates a
more robust design scheme of Rosetta to optimize peptide structure with multi-residue fragment
building (19, 20). Here, the peptide sequence is fixed and Rosetta is used to sample various
groups of backbone dihedral angles, effectively using MC tests of α, β, or coil structures on
clusters of 3 to 4 residues of the peptide. They also incorporate a scoring scheme using multiple
rounds of rigid body docking, while actively changing the weights of the attractive and repulsive
terms of the Rosetta score12 score function. A validation of this method on about 20
peptide-protein crystal structures shows promising results.

Other simulation methods have been used to improve flexibility and scoring/ranking as well. Use
of molecular dynamics software in combination with a docking program has emerged recently as
a technique complementary to all-Monte Carlo style docking. Variations of this method are
outlined in recent review articles (21, 22). One of the early uses of molecular dynamics in
docking was shown by Lin et al., where flexibility of the protein receptor is captured through
long molecular dynamics simulations and ligands are then docked to the ensemble of receptor
configurations (23). Later, Okimoto et al. used a multi-software method to combine
molecular dynamics and solvation energy computation in docking small ligands (24). The
authors used GOLD to first dock the ligand, and then minimized the output structure using a
molecular mechanics force field. The minimized structure was then simulated with AMBER 8.0
(ff03) using molecular dynamics on add-in computational hardware called MDGRAPE-3. Finally,
the structure was scored with MM/PB-SA to find the ΔG of binding by computing the
conformational energy of the ligand, nonbonded van der Waals and electrostatic interactions, the
solvation free energy, and the nonpolar solvation free energy. For a sample of 1000 ligands,
including 30% active ligands, the authors report improved results for three target proteins, with
the exception of CDK2.

In another approach, Antes presented DynaDock in 2009, a joint dynamics and docking
software package using customized interatomic potentials via the OPMD approach (25). OPMD
was used to sample multiple energy minima around a crystal structure configuration with
dynamics, and did not rely on annealing methods. Antes also incorporated a custom scoring
function, combining the conformational energy of the peptide with nonbonded van der Waals
and Coulombic interactions with the protein. When compared to AutoDock, the method
performed better for peptides where the number of rotatable bonds is larger than 15 (26). Finally,
Dagliyan et al. used several molecular dynamics methods, without formal docking software, to
generate bound configurations of peptide-protein complexes (27). Without knowledge of the
bound crystal structure, replica exchange dynamics coupled with discrete molecular dynamics
integration is used to simulate various peptides at random points around the protein surface. The
authors then compute the binding energy for these structures with the MedusaDock scoring
function. Results suggest that electrostatics play an important role in the formation of an
“encounter complex” prior to forming a bound conformation.

In addition to molecular dynamics and mechanics, use of different length and timescale methods
can improve flexibility and scoring/ranking as well. Nowosielski et al. connected quantum level
energetics to molecular dynamics and docking methods to study ligand binding in pantothenate
synthetase and successfully simulated the open-to-closed transition of the enzyme for ligand
binding (28). To achieve longer timescale simulations, the steered molecular dynamics (SMD)
method was used on various ligand-enzyme complexes by Whalen et al. (29). SMD simulations
captured ligand binding energies by inducing a transition from ligand-bound and enzyme-closed
to ligand-unbound and enzyme-open.

Although these examples incorporate dynamics slightly differently, they all show good
improvement on statically docked structures when applying dynamics in lieu of Monte Carlo
moves. It is also important to note that only one of the previously outlined docking methods starts
without a crystal structure, and many do not attempt docking peptides with unknown binding
locations. Even so, these works do highlight the benefits of using dynamics, and also exhibit a
need for a generalized, usable code to connect different styles of simulation software. In recent
works by Seeliger et al. and Lill et al., the authors connect the popular PyMol visualization and
analysis program to docking and molecular dynamics software (30-32). With this software,
researchers can conduct simulation and analysis of ligand binding and design within PyMol
using a graphical interface. Professional 3D modeling applications, such as Blender and
Autodesk 3DS Max*, are unique areas for connection to molecular docking. Users can add a
plug-in from the Olson Laboratory at the Scripps Research Institute called ePMV to dock biological
structures and compute energies (33).


* Autodesk 3DS Max is a registered trademark of Autodesk Inc.




The XPairIt Application Programming Interface (API) offers similar styles of docking
methodology and we show how some of these can be combined to improve flexibility and
scoring/ranking in peptide-protein docking. XPairIt works as a controller code for PyRosetta,
NAMD, PSFGen, STRIDE, VMD, APBS, and GAMESS, and allows a user to move information
between these software packages for custom simulations (1, 4, 34-37). Additionally, the XPairIt
API and external software packages are tailored for use on high-performance computing (HPC)
systems, which is critical for the search of unknown peptide binding locations and the
optimization of peptide sequences for improved protein binding.

With the existing state of the art in mind, the authors use a bottom-up approach to create a
connection between the various styles of simulation software and improve interoperability of the
software and their methods. The XPairIt API applied to peptide docking joins the external
molecular dynamics package NAMD and docking software PyRosetta to enhance peptide-protein
binding simulations, a combination we refer to as the XPairIt Docking Protocol (4, 34). When
contrasted with other multi-method docking software, use of the XPairIt Docking Protocol offers
more detailed control of docking simulations and the ability to run simulations on HPC systems.
In general, the methods are similar to the aforementioned work connecting PyMol, docking, and
molecular dynamics; however, there are several differences in the approach taken to construct
the simulations, the style in which we use molecular dynamics, and the analysis of the simulation
results. Furthermore, the XPairIt Docking Protocol is based on the highly extensible XPairIt
API, which allows for the simple addition and use of other custom code or external software
packages. In the following text, we provide our approach for peptide docking with the XPairIt
Docking Protocol, and detail the use of external software to address the challenges inherent to
peptides: flexibility and scoring. A previous version of this protocol was reported earlier (38).

We  organize  this  work  by  first  outlining  the  XPairIt  API  structure  and  philosophy,  then  develop 
a  more  robust  and  improved  style  of  molecular  docking,  applied  to  peptide-protein  interactions, 
in  Section  II.  Section  III  illustrates  the  application  of  the  XPairIt  docking  protocol  to  a  common 
case  study.  Analysis  and  Discussion  of  the  protocol  are  given  in  Section  IV.  Finally,  we  provide 
conclusions  and  remarks  on  the  future  of  the  software  in  Section  V. 


2.  Methodology 


2.1 XPairIt Application Programming Interface (API) Philosophy and Structure

XPairIt incorporates a detailed and customizable style of control and analysis during a
simulation run, allowing a user to put in place a method whose behavior is dependent on
properties computed in real time. Its structure is in line with other software and their inherent
partitioning of simulation components. In particular, the Etomica Simulator of the Kofke Group at
the University at Buffalo and the LAMMPS Molecular Simulator of Sandia National Laboratories
have both had significant influence on the organization of the XPairIt API (though all of the
code is original) (5, 39). The basic building blocks of a nanoscale simulation (Boxes, Atoms, and
Vectors) are represented in the software as Python objects. Methods, Integrators, Potentials,
and user-defined code perform operations on these objects to move atoms in space, or compute
properties of the system based on atomic positions or types. Through Python’s object-oriented
structure, a user may extend any one of these types to create a custom Method, Integrator, or
Potential, or simply use a combination of these in a unique way as a single Python script to
control a simulation or analyze its output. Figure 1 diagrams the main objects in XPairIt and how
they interact.


Figure 1. XPairIt API diagram showing simulation components, class hierarchy,
control, and connection to external software (XExternal) and hardware (HPC).
Supported external programs: PyRosetta, NAMD, PSFGen, STRIDE, VMD,
APBS, and GAMESS (7, 4, 34-37).

2.2 XPairIt API Classes

While not an exhaustive list, the classes below best frame the current version of the XPairIt API.

Atom. This object’s properties are taken directly from the properties of a real atom: position,
velocity, type, and radius. An extension of this object is created for our molecular docking
implementation, called AtomTypeBiological. This contains other information, such as charge,
occupancy, beta parameters, and species name.

Vector. This is a data structure holding a list of floats, as well as methods to perform vector
operations, such as addition, dot product, cross product, and normalize. Vector1D, Vector3D,
Vector3DRandom, and VectorN have been created to extend the original Vector class. The Atom
object creates its own Vector3D object to track its position.
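
To make the relationship between Atom and Vector concrete, the following is a minimal sketch, not the XPairIt source. The class names follow the text, but every method name, constructor signature, and attribute shown here is an assumption made for illustration.

```python
import math


class Vector:
    """Illustrative stand-in for a Vector: a list of floats plus basic operations."""

    def __init__(self, components):
        self.components = [float(c) for c in components]

    def add(self, other):
        return Vector([a + b for a, b in zip(self.components, other.components)])

    def dot(self, other):
        return sum(a * b for a, b in zip(self.components, other.components))

    def norm(self):
        return math.sqrt(self.dot(self))

    def normalize(self):
        """Scale the vector in place to unit length."""
        length = self.norm()
        if length == 0.0:
            raise ValueError("cannot normalize a zero-length vector")
        self.components = [c / length for c in self.components]


class Vector3D(Vector):
    """Three-component extension that adds the cross product."""

    def __init__(self, x=0.0, y=0.0, z=0.0):
        super().__init__([x, y, z])

    def cross(self, other):
        ax, ay, az = self.components
        bx, by, bz = other.components
        return Vector3D(ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


class Atom:
    """Illustrative Atom that owns its position and velocity vectors."""

    def __init__(self, atom_type, radius):
        self.type = atom_type
        self.radius = radius
        self.position = Vector3D()  # each Atom creates its own position vector
        self.velocity = Vector3D()


x_axis = Vector3D(1.0, 0.0, 0.0)
y_axis = Vector3D(0.0, 1.0, 0.0)
z_axis = x_axis.cross(y_axis)
carbon = Atom("C", 1.7)
```

Keeping the position inside a vector object, rather than as three bare floats, lets integrators and compute classes share one set of vector operations.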

Box.  The  simulation  volume  is  defined  by  this  class.  Box  also  holds  lists  of  all  the  atoms  that  are 
“inside”  the  volume,  information  about  boundary  conditions  and  program  components  such  as 
neighbor  lists,  and  provides  methods  for  creating  atoms  or  changing  atom  positions. 

BoxBiological,  an  extension  of  Box,  is  created  to  handle  other  properties  specific  to  a  biological 
simulation,  such  as  loading  a  PDB  file. 

Simulation.  This  is  the  first  object  that  should  be  created  during  the  development  of  a  simulation 
with  XPairIt.  It  contains  structures  for  the  user  to  connect  many  high-level  simulation 
components,  such  as  Integrators  and  Methods,  and  allows  these  components  to  broadcast 
information  in  a  one-to-all  type  message,  without  knowledge  of  the  other  components.  Included 
here  is  also  the  software’s  banner  and  copyright  information  for  any  connected  external  software. 
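
The one-to-all messaging described above can be pictured with a small sketch. This is only one way such a hub could work; the method names `add_component`, `broadcast`, and `receive` are hypothetical and are not taken from the XPairIt code.

```python
class Simulation:
    """Hypothetical hub: holds named components and relays messages between them."""

    def __init__(self):
        self._components = {}

    def add_component(self, name, component):
        self._components[name] = component

    def broadcast(self, sender, message):
        """Deliver the message to every registered component except the sender."""
        for component in self._components.values():
            if component is not sender:
                component.receive(message)


class LoggingComponent:
    """Toy component that simply records every message it hears."""

    def __init__(self):
        self.received = []

    def receive(self, message):
        self.received.append(message)


sim = Simulation()
integrator = LoggingComponent()
method = LoggingComponent()
analysis = LoggingComponent()
for name, part in [("integrator", integrator), ("method", method), ("analysis", analysis)]:
    sim.add_component(name, part)

# The integrator announces an event without knowing who is listening.
sim.broadcast(integrator, "step_finished")
```

The sender only needs a reference to the hub, so components can be added or removed without changing one another.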

Compute. These are simple classes that receive a list of Atoms and perform operations to
calculate specific properties of this list. Examples are CenterOfMass, DihedralAngle,
SharedInterfaceAtoms, RadialDistributionFunction, RadiusOfGyration, SecondaryStructure,
HydrogenBonds, MeanSquaredDisplacement, and SurfacePoints.
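
As an illustration of the Compute pattern, the sketch below implements two of the listed properties for a plain list of atoms. The `SimpleAtom` container, the `compute` method name, and the mass-weighted definitions are assumptions chosen for this example, not the XPairIt implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class SimpleAtom:
    """Hypothetical minimal atom record used only for this example."""
    mass: float
    position: tuple  # (x, y, z) in angstroms


class Compute:
    """Base class: receives a list of atoms and returns a property of that list."""

    def compute(self, atoms):
        raise NotImplementedError


class CenterOfMass(Compute):
    def compute(self, atoms):
        total_mass = sum(a.mass for a in atoms)
        return tuple(
            sum(a.mass * a.position[i] for a in atoms) / total_mass for i in range(3)
        )


class RadiusOfGyration(Compute):
    def compute(self, atoms):
        com = CenterOfMass().compute(atoms)
        total_mass = sum(a.mass for a in atoms)
        # Mass-weighted mean squared distance from the center of mass.
        weighted = sum(
            a.mass * sum((a.position[i] - com[i]) ** 2 for i in range(3)) for a in atoms
        )
        return math.sqrt(weighted / total_mass)


atoms = [SimpleAtom(1.0, (0.0, 0.0, 0.0)), SimpleAtom(1.0, (2.0, 0.0, 0.0))]
center = CenterOfMass().compute(atoms)
rg = RadiusOfGyration().compute(atoms)
```

Because each Compute takes any list of atoms, the same class can be applied to a whole Box, a single chain, or one residue.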

Integrator.  This  class  contains  methods  that  perform  an  operation  on  the  system,  typically  the 
Atoms.  Integrators  may  be  created  for  various  molecular  dynamics  integration  schemes,  or 
Monte Carlo sampling styles. These custom integrators can extend the main Integrator class to
obtain  access  to  callback  functions  to  broadcast  commands  through  Simulation. 

Method.  Typically,  this  structure  contains  code  for  a  multi-operation  scheme  involving  a 
collection  of  Integrators.  In  the  authors’  case,  several  different  Method  classes  were  created  to 
perform  molecular  docking  simulations.  Because  of  this  hierarchy,  a  user  can  create  a  complex 
simulation  with  only  a  few  lines  of  code,  using  existing  Method  classes. 

Residue, ResidueChain, Molecule. These are data structures that map to typical biological
groupings of atoms. An amino acid’s atoms are in a list within Residue, a bonded chain of these
amino acids is in a list within ResidueChain, and a collection of these bonded chains is in a
list within Molecule. The methods within these data structures allow for “top-down” and
“bottom-up” paths to their parent or child structures. For example, if a user is working with a
particular Atom, there are methods in place to get to that Atom’s parent Residue, ResidueChain,
and Molecule. Conversely, if a user has a Molecule, there are methods to choose a particular
ResidueChain within that Molecule, a particular Residue, and finally a specific Atom. This is
useful for passing logical groupings of atoms to Methods, Integrators, or Computes.
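
The two directions of traversal can be sketched with parent references set when a child is added. The class names match the text; all method names (`add_chain`, `get_residue`, `get_molecule`, and so on) are hypothetical stand-ins rather than the actual XPairIt methods.

```python
def _find(items, attribute, value):
    """Return the first item whose attribute equals value, or raise KeyError."""
    for item in items:
        if getattr(item, attribute) == value:
            return item
    raise KeyError(value)


class Atom:
    def __init__(self, name):
        self.name = name
        self.residue = None  # parent link, set by Residue.add_atom

    # "Bottom-up" paths from an atom to its parents.
    def get_residue(self):
        return self.residue

    def get_chain(self):
        return self.residue.chain

    def get_molecule(self):
        return self.residue.chain.molecule


class Residue:
    def __init__(self, name, number):
        self.name, self.number = name, number
        self.atoms, self.chain = [], None

    def add_atom(self, atom):
        atom.residue = self
        self.atoms.append(atom)
        return atom

    def get_atom(self, name):
        return _find(self.atoms, "name", name)


class ResidueChain:
    def __init__(self, chain_id):
        self.chain_id = chain_id
        self.residues, self.molecule = [], None

    def add_residue(self, residue):
        residue.chain = self
        self.residues.append(residue)
        return residue

    def get_residue(self, number):
        return _find(self.residues, "number", number)


class Molecule:
    def __init__(self, name):
        self.name, self.chains = name, []

    def add_chain(self, chain):
        chain.molecule = self
        self.chains.append(chain)
        return chain

    # "Top-down" path from a molecule to a chain.
    def get_chain(self, chain_id):
        return _find(self.chains, "chain_id", chain_id)


molecule = Molecule("complex")
chain_x = molecule.add_chain(ResidueChain("X"))
lys1 = chain_x.add_residue(Residue("LYS", 1))
ca = lys1.add_atom(Atom("CA"))
```

With these links in place, `ca.get_molecule()` walks upward and `molecule.get_chain("X").get_residue(1).get_atom("CA")` walks downward to the same atom.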

2.3  External  Software  Interface  and  Control 

X___. These are interfaces for the various external software programs used with XPairIt and are
named using the “X” convention. For example, a nonexhaustive list of current interface classes
in XPairIt is XNAMD, XPyRosetta, XAPBS, XGAMESS, and XVMD. Specialized Integrators call
these interface classes to allow the user to control an external program.

IntegratorX___. A specialized form of the XPairIt Integrator which calls a similarly named X___
class. The IntegratorX___ class enables the translation of XPairIt-style commands and structure to
and from an external software structure. An example is IntegratorXNAMD, where general
Integrator-style methods call into XNAMD to set up, execute, and return output from the NAMD
simulator. These Integrators are similar in structure to internal XPairIt Integrators, allowing a
user to easily perform operations with any type of IntegratorX___.
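
The division of labor between an interface class and its integrator resembles an adapter. The sketch below shows that pattern with a toy "external program" that only shifts coordinates; it does not call NAMD or any real package, and the names `XExternalProgram`, `XShift`, `setup`, `execute`, `get_positions`, and `step` are all assumptions for illustration.

```python
class XExternalProgram:
    """Hypothetical interface playing the role of an X___ class."""

    def setup(self, positions):
        raise NotImplementedError

    def execute(self):
        raise NotImplementedError

    def get_positions(self):
        raise NotImplementedError


class XShift(XExternalProgram):
    """Toy stand-in for an external engine: 'runs' by shifting every coordinate."""

    def __init__(self, shift):
        self.shift = shift
        self._positions = []

    def setup(self, positions):
        # Translate the caller's structures into the engine's own representation.
        self._positions = [tuple(p) for p in positions]

    def execute(self):
        self._positions = [tuple(c + self.shift for c in p) for p in self._positions]

    def get_positions(self):
        return list(self._positions)


class IntegratorX:
    """Integrator-style wrapper that drives any XExternalProgram the same way."""

    def __init__(self, external):
        self.external = external

    def step(self, positions):
        self.external.setup(positions)
        self.external.execute()
        return self.external.get_positions()


integrator = IntegratorX(XShift(1.0))
new_positions = integrator.step([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
```

Because the wrapper exposes the same `step` call regardless of the engine behind it, a Method can treat internal and external integrators interchangeably.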

2.4  A  Simulation  With  XPairIt  API 

Figure  2  shows  a  simple  Python  script  using  the  XPairIt  API  to  create  a  molecular  docking 
simulation,  employing  several  of  the  simulation  building  blocks  listed  in  the  previous  section. 
First,  a  Simulation  object  is  created  to  hold  a  few  of  the  working  parts  of  the  simulation.  Next, 
the  Simulation  is  used  to  create  a  Box.  Then,  a  configuration  in  the  form  of  a  PDB  file  is  read  in 
to create a Molecule, ResidueChain(s), Residue(s), and the thousands of Atom and
AtomTypeBiological  objects  in  the  simulation.  Next,  a  new  docking  method  is  created  from 
existing  code.  Subsequently,  the  Method  is  added  to  the  Simulation,  passing  a  name,  the  Method, 
and  the  Box  which  the  Method  will  operate  on.  Then  after  variables  are  created  for  the  peptide 
chain  “X”  and  number  of  docking  attempts,  the  setup  routine  in  Method  is  called.  Finally,  the 
Method is run with chosen parameters.

from Methods.MethodDockingDynamics import MethodDockingDynamics
from Simulation.SimulationXPairIt import SimulationXPairIt
import sys

#0 create simulation object
sim = SimulationXPairIt()

#1 create box
box = sim.createBoxBiological('box1')

#2 create molecule from PDB file
molecule = box.createMolecule('pep-pro_dock.pdb')

#3 create pre-packaged docking method
dockingMethod = MethodDockingDynamics('dockingTest')

#4 add method to simulation object
sim.addMethod('docker1', dockingMethod, box)

#5 get peptide chain 'X'
peptidechain = molecule.getResidueChainByID('X')

#6 setup docking method
dockingMethod.setup(peptidechain, 'score12')

#7a run method for global docking
attempts = 50
dockingMethod.attempt(attempts)

#7b run method for focused docking
attempts = 75
dockingMethod.attempt(attempts, _focusDock=True)


Figure  2.  Python  script  example  of  a  simple  XPairIt  docking  simulation. 

2.5 Improved Docking Methods for Peptide Systems with XPairIt Docking Protocol

Before we begin our global docking search, we first equilibrate the peptide and protein.
Simulations of the docking partners, peptide and protein, proceed separately. XPairIt drives an
NPT simulation at 1.0 atm and 300 K in NAMD, using the CHARMM interatomic potential with
TIP3P water (40, 41). The final structure of the protein from equilibration simulations is
minimized within NAMD. Atomic positions are captured by XPairIt and sent to PyRosetta, and
then the protein’s residue sidechain positions are sampled and minimized using the Rosetta
repacking scheme. This protein structure is then used in all docking runs. For peptide equilibration,
starting from a linear structure, the peptide is equilibrated for several nanoseconds. This dynamics
trajectory is saved as a series of 1000 snapshots, from which the peptide’s starting structure is
randomly selected in subsequent docking runs. This is also known as ensemble docking.

The docking portion of this protocol is separated into two stages: (1) the initial docking run,
testing peptide binding over the entire protein surface, and (2) a more focused docking run,
testing probable binding locations on the protein. The first stage is composed of 2000 to 5000
simulations, where the exact number varies with protein size. Each simulation begins by drawing
a random peptide structure from the initial dynamics trajectory and placing it in a simulation box
with an equilibrated, minimized protein structure. Within XPairIt, atom positions are sent to
PyRosetta. Controlling PyRosetta, the partners are randomly rotated around their centers of mass
and moved into contact with one another, until a pair of surface atoms, one from each partner, is
approximately 4.0 Å apart. This creates starting structures where the peptide is placed at a
uniform distribution of points on the protein’s surface.
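
The placement step can be sketched in plain Python as a random rotation followed by a slide into contact. This is not PyRosetta's algorithm: the uniform-quaternion rotation, the straight-line approach, the 0.1 Å step, the use of an unweighted centroid in place of the center of mass, and all function names are assumptions made for illustration.

```python
import math
import random


def centroid(coords):
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def random_rotation_matrix(rng):
    """Uniform random rotation built from a random unit quaternion."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    x = math.sqrt(1.0 - u1) * math.sin(2.0 * math.pi * u2)
    y = math.sqrt(1.0 - u1) * math.cos(2.0 * math.pi * u2)
    z = math.sqrt(u1) * math.sin(2.0 * math.pi * u3)
    w = math.sqrt(u1) * math.cos(2.0 * math.pi * u3)
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)),
        (2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)),
        (2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)),
    )


def rotate_about_centroid(coords, rot):
    c = centroid(coords)
    rotated = []
    for p in coords:
        d = [p[i] - c[i] for i in range(3)]
        rotated.append(
            tuple(c[i] + sum(rot[i][j] * d[j] for j in range(3)) for i in range(3))
        )
    return rotated


def min_distance(a_coords, b_coords):
    return min(math.dist(a, b) for a in a_coords for b in b_coords)


def slide_into_contact(protein, peptide, direction, contact=4.0, step=0.1, gap=10.0):
    """Start the peptide clear of the protein along `direction`, then step it
    toward the protein centroid until the closest atom pair is within `contact`."""
    length = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / length for d in direction)
    cp, cq = centroid(protein), centroid(peptide)
    separation = (
        max(math.dist(p, cp) for p in protein)
        + max(math.dist(q, cq) for q in peptide)
        + gap
    )
    placed = [
        tuple(q[i] - cq[i] + cp[i] + separation * unit[i] for i in range(3))
        for q in peptide
    ]
    for _ in range(int(separation / step) + 1):
        if min_distance(protein, placed) <= contact:
            return placed
        placed = [tuple(q[i] - step * unit[i] for i in range(3)) for q in placed]
    raise RuntimeError("no contact found along this approach direction")


rng = random.Random(7)
protein = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 3.0), (3.0, 3.0, 3.0)]
peptide = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
rotated = rotate_about_centroid(peptide, random_rotation_matrix(rng))
direction = tuple(rng.gauss(0.0, 1.0) for _ in range(3))  # random approach direction
start_structure = slide_into_contact(protein, rotated, direction)
```

Drawing the approach direction from three Gaussian components gives directions that are uniform over the sphere, which is one simple way to spread starting positions around the protein.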


For step (i) of the XPairIt Docking Protocol stage 1, these simulations (all 2000 to 5000 of them)
are run using the Rosetta DockingMCM method and the score12 score function, where the lowest
energy structure is determined from sampling small rotation and translation moves of the
peptide. Rosetta’s DockingMCM method also samples sidechain rotamers of the peptide and
protein. After step (i) is finished, atom positions are analyzed for structural and energetic
properties in a bookkeeping step, and sent to NAMD for step (ii). Here, molecular dynamics is
performed on the peptide and the protein atoms within 15.0 Å of the peptide, leaving other atoms
fixed. This simulation is performed at 300 K using the Generalized-Born Implicit Solvent
(GBIS+SASA) for 2.0 picoseconds (ps), and then the structure is minimized for 3000 conjugate
gradient steps (42). Finally, for step (iii), the structure’s sidechains are again repacked with
Rosetta, based on the score12 scoring function, and this structure is exported as a PDB file. A
typical change in peptide position for the three main steps of the XPairIt Docking Protocol is
shown in figure 3. The (i) docking, (ii) dynamics + minimization, (iii) repacking process is
repeated 10 times for additional random rotations of the peptide. Thus, once the first stage of the
docking protocol is complete, 2000 to 5000 simulations have been run, each starting from a random
peptide structure and the minimized protein structure and each repeated for 10 peptide
orientations, producing 20,000 to 50,000 probable bound structures.
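
The choice of which protein atoms remain mobile in step (ii) amounts to a distance filter around the peptide. The following is a minimal sketch of that selection only; the function name and the plain coordinate tuples are assumptions, and the way the resulting list would be passed to NAMD as fixed or free atoms is not shown.

```python
import math


def select_mobile_atoms(protein_coords, peptide_coords, cutoff=15.0):
    """Return indices of protein atoms within `cutoff` angstroms of any peptide atom."""
    mobile = []
    for index, atom in enumerate(protein_coords):
        if any(math.dist(atom, pep) <= cutoff for pep in peptide_coords):
            mobile.append(index)
    return mobile


protein_coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
peptide_coords = [(0.0, 0.0, 5.0)]

mobile = select_mobile_atoms(protein_coords, peptide_coords)
fixed = [i for i in range(len(protein_coords)) if i not in mobile]
```

Restricting dynamics to this neighborhood keeps the cost of each short run low, which matters when the step is repeated tens of thousands of times.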


Figure 3. Example positions of the peptide during an XPairIt docking
simulation. (1) Initial placement, shown in green. (2) Docked
peptide, in blue. (3) Minimized structure using conjugate
gradient method after 2 ps molecular dynamics run, in red.

The second stage in the XPairIt Docking Protocol consists of many focused docking simulations,
which restrict the peptide to sample only certain areas of the protein. To identify these areas,
the 20,000 to 50,000 docked structures from stage 1 are sorted by total energy, and the top 1000
are then ranked by their interface energy using Rosetta’s score12 score function. The interface
energy is shown in equation 1. Sorting first by total energy removes any spurious results
and restricts our sampling to more likely configurations. The further sort by
interface energy captures favorable interactions between the peptide and protein.

E_{Interface} = E_{Total} - E_{Peptide} - E_{Protein}    (1)
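
The two-pass ranking can be sketched as follows, assuming the interface energy is the total energy minus the energies of the separated peptide and protein, and that lower energies are better. The `DockedStructure` record, the function name, and the example energies are hypothetical; the cutoffs are shrunk from 1000 and 25 so the effect is visible on four structures.

```python
from dataclasses import dataclass


@dataclass
class DockedStructure:
    """Hypothetical record of the energies stored for one docked structure."""
    name: str
    e_total: float
    e_peptide: float
    e_protein: float

    @property
    def e_interface(self):
        # Assumed form: complex energy minus the energies of the separated partners.
        return self.e_total - self.e_peptide - self.e_protein


def rank_structures(structures, keep_total=1000, keep_interface=25):
    """Keep the lowest total-energy structures, then rank those by interface energy."""
    by_total = sorted(structures, key=lambda s: s.e_total)[:keep_total]
    return sorted(by_total, key=lambda s: s.e_interface)[:keep_interface]


structures = [
    DockedStructure("A", -100.0, -10.0, -80.0),  # interface -10
    DockedStructure("B", -95.0, -5.0, -70.0),    # interface -20
    DockedStructure("C", -90.0, -20.0, -65.0),   # interface -5
    DockedStructure("D", -50.0, -5.0, -5.0),     # interface -40, but poor total energy
]

top = [s.name for s in rank_structures(structures, keep_total=3, keep_interface=2)]
```

Structure D has the most favorable interface energy but is discarded by the first pass, which is the filtering of spurious results that the total-energy sort is meant to provide.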

The top 25 structures by interface energy are then clustered using XPairIt, based on the peptide
heavy atom distance from each protein heavy atom using a cutoff distance of 4.0 Å. This
calculation generates a list of protein residues that have contact with the peptide for each
structure, and these contacts are counted and totaled for the top 25 structures. For example, a
particular residue on the protein may be contacted by the peptide a total of five times
when all 25 structures are analyzed. From these totals, the average number of peptide contacts
per protein residue is computed, and those residues whose contact counts are more than one
standard deviation above the mean are recorded as possible binding locations. Next, the structure
with the best interface energy is identified at each of these possible binding locations and saved
as a starting structure for a stage 2 focused dock simulation. An example of this clustering is
shown in the subsequent section.
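
A compact sketch of the contact counting and cutoff is given below. It assumes that each structure contributes at most one contact per protein residue, that the mean and standard deviation are taken over residues with at least one contact, and that a population standard deviation is used; the text does not state which estimator the authors apply. Function names and the example data are hypothetical.

```python
import math
from collections import Counter


def contacted_residues(peptide_atoms, protein_atoms, cutoff=4.0):
    """protein_atoms is a list of (residue_id, (x, y, z)) heavy atoms.
    Returns the set of residue ids with any atom within `cutoff` of the peptide."""
    return {
        residue_id
        for residue_id, position in protein_atoms
        if any(math.dist(position, pep) <= cutoff for pep in peptide_atoms)
    }


def binding_residues(contact_sets):
    """Total contacts per residue over all structures and keep residues whose
    count exceeds the mean by more than one standard deviation."""
    counts = Counter()
    for residues in contact_sets:
        counts.update(residues)
    values = list(counts.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    selected = sorted(r for r, c in counts.items() if c > mean + std)
    return selected, mean, std


# One contact set per docked structure (five structures instead of 25).
contact_sets = [{50, 121}, {121, 122}, {121}, {121, 200}, {122, 121}]
selected, mean, std = binding_residues(contact_sets)

example = contacted_residues(
    [(0.0, 0.0, 3.0)], [(50, (0.0, 0.0, 0.0)), (121, (10.0, 0.0, 0.0))]
)
```

In this toy data, residue 121 is contacted in all five structures and is the only residue above the cutoff.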

Each  simulation  for  the  next  stage  begins  by  using  the  new  starting  structures  from  clustering, 
and  again  drawing  a  random  peptide  structure  from  the  equilibration  trajectory.  This  random 
peptide  structure  is  then  moved  to  the  center  of  mass  of  the  new  starting  structure.  1000  to  3000 
of  these  simulations  are  run  for  each  probable  binding  location  and  the  (i)  docking,  (ii)  dynamics 
+  minimization,  and  (iii)  repacking  steps  are  repeated  for  10  different  orientations  of  the  peptide. 
The 10,000 to 30,000 docked structures generated for each possible binding location are ranked
by their total energy using Rosetta’s score12 function, and the top energies for each location are
compared to determine the likely binding location. The final result of the XPairIt Docking
Protocol is thus determined by total energy.


3.  Results  and  Discussion 


We test the XPairIt Docking Protocol and present results from a case study, docking a short
peptide to a small protein. For this test we choose the 1RXZ system from the Protein Data Bank,
which has been previously studied with varying degrees of success, using two different docking
schemes (20, 27, 44). This particular case presents added complexity to the simulation of peptide
docking in the form of solvent and receptor induced peptide structure and probable receptor
flexibility. The two partners here are the 245-residue DNA polymerase sliding clamp protein,
aPCNA, and an 11-mer peptide (KSTQATLERWF) created from the binding motif of the Flap
EndoNuclease-1 (aFEN-1) (45). Equilibration with peptide trajectory sampling, one stage of
global docking, and one stage of focused docking are performed according to the previously
outlined XPairIt Docking Protocol.

3.1 Stage 1 Global Docking

After  equilibration  of  the  two  partners  and  creation  of  the  peptide  trajectory,  we  ran  3000 
simulations  in  the  global  docking  stage.  The  resulting  docked  structures  are  sorted  by  total 
energy,  and  the  top  1000  are  then  ranked  by  interface  energy.  Figure  4  presents  an  overlay  of  the 
top  25  docked  structures  based  on  interface  energy.  Protein  residues  with  six  or  more  contacts, 
and  used  in  the  formulation  of  stage  2  focused  docking  starting  location(s),  are  shown  with  a 
shaded  surface. 


Figure 4. Stage 1 global docking results presented as overlay of the
top 25 docked structures. Structures are ranked by their
interface energy using Rosetta’s score12. Frequently
contacted protein residues are shown with a shaded surface.
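
The two-step ranking used in this stage (sort by total energy, keep the best 1000, then rank those by interface energy and keep the top 25) can be written compactly. The sketch below assumes each docked structure is a dictionary with hypothetical keys `total_energy` and `interface_energy`; it is not taken from the XPairIt source.

```python
def select_top_structures(structures, n_total=1000, n_interface=25):
    """Keep the n_total lowest total energies, then the n_interface
    lowest interface energies among them (lower energy is better)."""
    lowest_total = sorted(structures, key=lambda s: s["total_energy"])[:n_total]
    return sorted(lowest_total, key=lambda s: s["interface_energy"])[:n_interface]
```

Filtering on total energy first keeps structures that are stable overall, so a favorable interface energy is not obtained at the expense of a strained complex.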

In figure 5, the top 25 structures based on interface energy are clustered and plotted on a
raw-data histogram, showing specific protein residues (Residues on A) in contact with the peptide.
For the protein residues in contact with the peptide, the average number of peptide contacts over
all 25 structures is 2.60 and the standard deviation is 2.47. This gives an integer contact
cutoff of 6 or greater. From the data illustrated in figure 5, we identify protein residues 50,
121-126, and 218-220 as having 6 or more contacts when we sample the top 25 structures.

Figure 5. Stage 1 global docking results presented as contact
histogram. Number of peptide residues (Frequency)
within 4.0 Å of particular protein residues (Residues on
A) for the top 25 docked structures. Structures ranked
by their interface energy using Rosetta’s score12.
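
The reported cutoff is consistent with taking the mean plus one standard deviation (2.60 + 2.47 = 5.07) and requiring an integer contact count above that value. The sketch below encodes that reading; the rule itself is an assumption inferred from the numbers above, and the function names are hypothetical.

```python
import math


def integer_contact_cutoff(mean_contacts, std_contacts):
    """Smallest integer contact count strictly above mean + one standard deviation."""
    return math.floor(mean_contacts + std_contacts) + 1


def residues_at_or_above(contacts_by_residue, cutoff):
    """Residue numbers whose contact count meets the cutoff."""
    return sorted(res for res, n in contacts_by_residue.items() if n >= cutoff)


cutoff = integer_contact_cutoff(2.60, 2.47)
print(cutoff)  # 6
```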

3.2 Stage 2 Focused Docking

Before beginning stage 2 focused docking, we generated starting configurations by identifying
the best ranked structure based on interface energy at each of the locations determined from the
clustering of stage 1 global docking results. A list of the locations on the protein, with the
corresponding number of contacts and best ranked structure, is given in table 1. From this list,
four unique docked structures (#3, #5, #7, and #8) were identified as starting structures, shown
in figure 6. 1000 docking simulations were then run for each docked starting structure, using
peptide configurations randomly drawn from the initial equilibration trajectory and moved to the
docked structure peptide’s center of mass. For each simulation, 10 rounds of docking were
performed using random rotations of the peptide to start each dock. Stage 2 focused docking
simulations are run as described in the previous section, and results are ranked by total energy
using the Rosetta score12 score function.

Table 1. Stage 1 global docking results clustered as list of
protein residues with six or more peptide contacts
in the top 25 structures. Structures ranked by
interface energy.

Protein    No. of Contacts    Best Ranked    Interface Energy
Residue    in Top 25          Structure      (Rosetta Units)
50         7                  #3             -13.834
121        10                 #3             -
122        11                 #3             -
123        10                 #7             -13.046
124        10                 #5             -13.339
125        11                 #5             -
126        7                  #5             -
218        6                  #8             -12.589
219        9                  #5             -13.339
220        7                  #8             -
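
The reduction from table 1 to the four focused docking starting structures is a simple de-duplication of the "Best Ranked Structure" column. The sketch below reproduces it from the residue, contact count, and structure label of each row; the variable and function names are illustrative only.

```python
# (protein residue, contacts in top 25, best ranked structure) from table 1
TABLE_1 = [
    (50, 7, "#3"), (121, 10, "#3"), (122, 11, "#3"), (123, 10, "#7"),
    (124, 10, "#5"), (125, 11, "#5"), (126, 7, "#5"), (218, 6, "#8"),
    (219, 9, "#5"), (220, 7, "#8"),
]


def unique_starting_structures(rows):
    """Distinct best ranked structures, ordered by their global docking rank."""
    labels = {structure for _, _, structure in rows}
    return sorted(labels, key=lambda label: int(label.lstrip("#")))


starts = unique_starting_structures(TABLE_1)
# 1000 simulations per starting structure, 10 docking rounds per simulation
total_docks = len(starts) * 1000 * 10
print(starts, total_docks)  # ['#3', '#5', '#7', '#8'] 40000
```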


Figure 6. Stage 1 global docking top structures based on
interface energy for protein residues with sufficient
contact. Peptides are labeled with their rank for the
top 25 structures based on interface energy. Shown
as a shaded surface are protein residues
corresponding to histogram peaks in figure 5.

Results for the top ranked structures for each stage 2 focused docking starting location are shown
in table 2. Here, structures for each of the four previously identified locations are individually
ranked by total energy, and the structures with the lowest total energy for each location are
compared. In table 2, these top structures are listed in order of total energy, with additional
details about their starting location, interface energy, and RMSD. Validation of these results is
conducted by computing the final complex’s peptide RMSD from the equilibrated crystal
structure of the docked complex. In comparing these values, it is clear that our top docked
structure is not within a reasonable distance of the actual binding location. However,
investigation of the rank 2 structure’s peptide RMSD shows very good agreement with the
equilibrated crystal structure. Subsequent analysis of the two remaining structures shows poor
agreement with the equilibrated crystal structure. Lacking knowledge of the RMSD,
differentiation of the top three structures based on total energy is difficult when one accounts for
thermal fluctuations of the peptide-protein complex.


Table 2. Stage 2 focused docking results for each starting structure, ranked by total
energy.

Rank    Starting    Total Energy       Interface Energy    Peptide RMSD from
        Location    (Rosetta Units)    (Rosetta Units)     Crystal Structure (Å)
1       G.#3        -551.133           -10.712             25.941
2       G.#8        -549.232           -21.541             3.386
3       G.#5        -547.761           -13.798             20.112
4       G.#7        -541.183           -8.739              32.135

In an experimental collaboration, studying a system with an unknown binding location, one
might choose to explore additional simulation methods at this phase of the protocol to help
differentiate the structures. Methods such as a long molecular dynamics run to ensure binding
stability or a free energy method to compute a thermodynamically accurate binding energy could
be used. Depending on instrument throughput, a search space reduced to a handful of structures
may even be welcomed by the experimental team, who could now conduct a manageable number
of single or double mutation studies to check binding location. In our case we continue with
further analysis and look also at the interface energy for each of the top structures. Of the top
three structures, rank 2 possesses nearly double the interface energy of the other two structures
with similar total energy. The structure’s low RMSD when compared to the crystal structure and
low interface energy make it an ideal candidate for a probable binding location. We continue
with analysis of this structure and comment on different aspects of the XPairIt Docking Protocol.
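
The comparison above can be checked directly from the values in table 2. The short sketch below computes the total energy gaps between consecutive ranks and the ratio of the rank 2 interface energy to that of ranks 1 and 3; the names are illustrative and the numbers are copied from table 2.

```python
# (rank, starting location, total energy, interface energy, peptide RMSD) from table 2
TABLE_2 = [
    (1, "G.#3", -551.133, -10.712, 25.941),
    (2, "G.#8", -549.232, -21.541, 3.386),
    (3, "G.#5", -547.761, -13.798, 20.112),
    (4, "G.#7", -541.183, -8.739, 32.135),
]


def total_energy_gaps(rows):
    """Total energy difference between each rank and the next one."""
    return [round(b[2] - a[2], 3) for a, b in zip(rows, rows[1:])]


def interface_ratio(rows, rank, other_rank):
    """Interface energy of `rank` divided by that of `other_rank`."""
    by_rank = {row[0]: row[3] for row in rows}
    return by_rank[rank] / by_rank[other_rank]


print(total_energy_gaps(TABLE_2))
print(round(interface_ratio(TABLE_2, 2, 1), 2), round(interface_ratio(TABLE_2, 2, 3), 2))
```

The top three total energies lie within about 3.4 Rosetta units of each other, while the rank 2 interface energy is roughly 2.0 and 1.6 times that of ranks 1 and 3, which is why interface energy separates the candidates where total energy does not.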

3.3 Analysis of Rank 2 Structure

The rank 2 structure from stage 2 focused docking is listed in table 2 and its starting structure
(G.#8) is shown in figure 7. The rank 2 result shows good agreement with the equilibrated crystal
structure, and peptide RMSDs for structures aligned by protein alpha-carbon atoms are 7.595 Å
for G.#8 and 3.386 Å for rank 2. For the RMSD calculation, although both structures were aligned
to the equilibrated crystal structure using only protein alpha-carbon positions, there are several
highly flexible parts of the protein within the peptide binding region. It is likely that for the rank
2 structure the peptide is bound in a very similar configuration to the equilibrated crystal
structure, but due to the flexible regions, RMSD values may be affected by only partial
alignment of the protein during the value’s computation. The RMSD for the protein alpha-carbon
components in the focused structure vs. the crystal structure is 1.715 Å.
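
To make the RMSD procedure concrete, the sketch below superimposes a docked complex onto a reference using only protein alpha-carbon coordinates and then measures the peptide RMSD without refitting the peptide. It uses the standard Kabsch algorithm with NumPy as a stand-in; the report does not state which alignment tool was used, and the function names and array layout (N x 3 coordinate arrays with matching atom order) are assumptions.

```python
import numpy as np


def kabsch_transform(mobile, reference):
    """Rotation R and translation t that best superimpose mobile onto reference."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    covariance = (mobile - mob_center).T @ (reference - ref_center)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = ref_center - rotation @ mob_center
    return rotation, translation


def peptide_rmsd_after_protein_fit(protein_ca, peptide, ref_protein_ca, ref_peptide):
    """Fit on protein alpha-carbons only, then compute the peptide RMSD."""
    rotation, translation = kabsch_transform(protein_ca, ref_protein_ca)
    moved = peptide @ rotation.T + translation
    diff = moved - ref_peptide
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the fit uses every protein alpha-carbon, flexible loops near the binding site can shift the superposition and inflate the peptide RMSD even when the local binding geometry is similar, which is the effect discussed above.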


Figure  7.  Global  dock  #8  structure,  one  of  four  starting  locations  for 
focused  docking,  with  most  frequently  contacted  protein 
residues  highlighted. 

Analysis of the CHARMM interface energies with GBIS w/SASA used in the molecular dynamics
and minimization step (ii) of both docking rounds provides further description of the peptide-
protein interaction. As shown in table 3, when applied to the final structures of the global and
focused rounds of docking, interface energies favor structure #8 (global) and #2 (focused, from
G.#8), correctly identifying the docked structures which best resemble the 1RXZ crystal structure.
Additionally, the effects of solvent and pair-wise electrostatic interactions are evident, but do not
suggest any aid in differentiating structures. When comparing the overall interface energy from
CHARMM+GBIS w/SASA and the CHARMM van der Waals (vdW) component, the differences
between these two properties for focused docking #1 (G.#3) and focused docking #2 (G.#8) are
similar, meaning the remaining energetic contributions from electrostatics and solvent are
similar. Consequently, computation of the CHARMM interface energy with GBIS w/SASA
correctly identifies the preferred bound structure, indicating that pair-wise electrostatics and
solvent driving forces play some part in the interface energy of the docked structures. However,
since the contributions from electrostatics and solvent in aggregate are positive, and also similar
for both focused docking results, it is likely they do not play an active role in forming these
particular docked structures. Our results show that the CHARMM vdW component is the deciding
factor in structure differentiation for 1RXZ.

Table 3. Final interface energy of three bound structures in score12, CHARMM with
Generalized-Born implicit solvent (GBIS), and CHARMM’s vdW component.

Structure                       score12            CHARMM+GBIS    CHARMM vdW
                                (Rosetta Units)    (kcal/mol)     Component
                                                                  (kcal/mol)
Global Dock #3                  -13.834            -19.736        -37.183
Global Dock #8                  -12.589            -23.315        -40.335
Focused Dock #1 (from G.#3)     -10.712            -13.835        -31.728
Focused Dock #2 (from G.#8)     -21.541            -38.780        -56.318
Crystal Structure               -22.543            -52.246        -70.095
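
The argument about electrostatics and solvent can be verified by subtracting the vdW component from the CHARMM+GBIS interface energy for each row of table 3. The sketch below does this arithmetic; treating the remainder as the aggregate electrostatic and solvent contribution follows the description in the text, and the names used are illustrative.

```python
# structure: (CHARMM+GBIS interface energy, CHARMM vdW component), kcal/mol, from table 3
TABLE_3 = {
    "Global Dock #3": (-19.736, -37.183),
    "Global Dock #8": (-23.315, -40.335),
    "Focused Dock #1 (from G.#3)": (-13.835, -31.728),
    "Focused Dock #2 (from G.#8)": (-38.780, -56.318),
    "Crystal Structure": (-52.246, -70.095),
}


def non_vdw_remainder(table):
    """CHARMM+GBIS energy minus its vdW component, per structure."""
    return {name: round(total - vdw, 3) for name, (total, vdw) in table.items()}


for name, remainder in non_vdw_remainder(TABLE_3).items():
    print(f"{name}: {remainder:+.3f} kcal/mol")
```

For the two focused docking results the remainder is +17.893 and +17.538 kcal/mol, positive and nearly equal, so the difference between the two structures comes almost entirely from the vdW term.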

3.4 Effects of Molecular Dynamics Within the XPairIt Protocol

Finally, we have an opportunity to once again analyze the effect of molecular dynamics on
peptide docking results (38). In table 4 we list the Rosetta score12 interface energies at various
steps along both docking stages. There is a marked decrease in interface energy as we employ
NAMD molecular dynamics and minimization, as well as a final repacking by Rosetta. We
reiterate that during the step (i) Rosetta Dock, the peptide is translated and rotated, and peptide
and protein interface sidechains are repacked, similar to step (iii). Backbone angles and bond
lengths remain fixed. With the addition of molecular dynamics and minimization in step (ii), the
peptide and protein atoms are allowed to relax and move with the effects of temperature and
interatomic forces, creating greater contact between the peptide and protein surface, and
decreasing the interface energy. We attribute the second decrease in interface energy in step (iii)
by Rosetta sidechain repacking to a discrepancy between the CHARMM and score12 energy
functions. After step (ii) in the XPairIt Docking Protocol simulations, atom positions do not
change and are sent from NAMD, back to XPairIt, and then directly to PyRosetta for final
sidechain repacking. Additionally, a closer look at the effects of dynamics on the docked
structure is illustrated in figure 8, where we compare the per residue interface energy of the
peptide after step (i) and after step (iii) of stage 1 global docking. The LEFT plot shows minimal
contact of the peptide after step (i) Rosetta Dock, when compared to step (iii) Rosetta sidechain
repacking on the RIGHT plot.

Table 4. Interface energies for docked structures at each step of the XPairIt Docking Protocol.
Energies in Rosetta units.

Structure                         (i) Rosetta Dock    (ii) NAMD MD      (iii) Rosetta Repack
                                                      + Minimization    Sidechains
Global Dock #8                    -1.525              -8.843            -12.589
Focused Dock #1                   -10.241             -16.978           -21.541
Equilibrated Crystal Structure    -                   -                 -22.543


Figure 8. Rosetta score12 interface energy per peptide residue (Chain X) for stage 1 global docking
#8 structure: LEFT step (i) Rosetta Dock and RIGHT step (iii) Rosetta Repack
Sidechains, after NAMD MD + Minimization. Peptide sequence: KSTQATLERWF. Note
difference in ordinate scales.
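
The size of each decrease reported in table 4 can be obtained by differencing consecutive steps. The sketch below does so for the two docked structures, using the row labels and values as they appear in table 4; the helper name is illustrative.

```python
# Interface energy (Rosetta units) after steps (i), (ii), (iii), from table 4
TABLE_4 = {
    "Global Dock #8": (-1.525, -8.843, -12.589),
    "Focused Dock #1": (-10.241, -16.978, -21.541),
}


def step_changes(energies):
    """Change in interface energy from step (i) to (ii) and from (ii) to (iii)."""
    return tuple(round(b - a, 3) for a, b in zip(energies, energies[1:]))


for name, energies in TABLE_4.items():
    md_change, repack_change = step_changes(energies)
    print(f"{name}: MD + minimization {md_change:+.3f}, repack {repack_change:+.3f}")
```

In both rows the molecular dynamics and minimization step accounts for the larger share of the decrease (about 7 Rosetta units), with the final repack contributing roughly another 4.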


4.  Summary  and  Conclusions 


The XPairIt Docking Protocol is demonstrated here to be a suitable toolkit for the flexible
docking of peptides to protein receptors with unknown binding locations. Final docked structures
from global docking simulations of the 1RXZ system using the XPairIt Docking Protocol show
good agreement with the PDB crystal structure. Previous docking studies of this system by
Raveh et al. and Dagliyan et al. test the limits of their respective docking protocols (20, 27).
Crystal structure refinement of the 1RXZ system by Raveh et al. using their FlexPepDock
protocol predicted a final structure of the peptide in helical form, which deviates from the
coiled/linear backbone of the peptide in its bound configuration. As reviewed in Section 1, using
FlexPepDock, peptide fragments are sampled to determine optimal structure, and show very
good results for all systems, except for 1RXZ and one other reported. We reserve any further
comparison to our protocol, as FlexPepDock is not used for global peptide docking; this method
may be best used with an unknown peptide structure and a known protein binding pocket.
Peptide docking of the 1RXZ structures with discrete molecular dynamics and MedusaDock in
Dagliyan et al. shows good agreement with the crystal structure, recovering an RMSD on the
same order of magnitude as those reported here. However, Dagliyan et al. suggest that capturing
the binding-induced protein structural change remains a major challenge. They note that the
inclusion of backbone flexibility in the flexible receptor simulations significantly increases the
computational time and prefer a fixed backbone with flexible sidechains, where their discrete
dynamics regularly run 30-40 ns.

While the XPairIt Docking Protocol dynamics runs are on the picosecond timescale and
therefore cannot guarantee large-scale protein backbone motion, results from 1RXZ illustrate
some improvement in this case of receptor flexibility. When compared to the crystal structure,
the protein-only alpha-carbon RMSD for our top dock is 1.715 Å, a sizable difference, as only
protein atoms within 15 Å of the peptide move during our simulation and RMSD is averaged over
all protein alpha-carbons. Additionally, adding a focused round of docking using molecular
dynamics allows for significant improvement of protein-peptide contact and an improvement of
RMSD, shown previously in figure 8 and section 3.4. These results indicate an important change
in both peptide and protein backbone structure, and are achieved with a small amount of
dynamics simulation time by combining Rosetta and NAMD in a coordinated
implementation.

Overcoming the current challenges in global peptide docking, such as the representation of
flexibility and accurate energetics, requires simulation of peptide-protein systems using multiple
methods, and accordingly, multiple software packages. The extendable XPairIt API provides a
unifying code structure, with the ability to successfully implement these software packages and
their methods on-the-fly, as customizable docking simulations within the XPairIt Docking
Protocol.

List of Symbols, Abbreviations, and Acronyms

aPCNA        Proliferating Cell Nuclear Antigen
aFEN-1       Flap EndoNuclease-1
API          application programming interface
CDK2         Cyclin-dependent Kinase-2
CHARMM       Chemistry at Harvard Molecular Mechanics
DNA          Deoxyribonucleic acid
GOLD         Genetic Optimization for Ligand Docking
GPU          graphics processing unit
MC           Monte Carlo
MCM          Monte Carlo Mover
MM/PB-SA     Molecular Mechanics / Poisson-Boltzmann Surface Area
MMTK         Molecular Modeling Toolkit
MOE          Molecular Operating Environment
OPMD         optimized potential molecular dynamics
SMD          steered-molecular dynamics
ePMV         embedded Python Molecular Viewer
HPC          high-performance computing
NAMD         Nanoscale Molecular Dynamics
GBIS         Generalized-Born Implicit Solvent
SASA         solvent accessible surface area
PDB          protein data bank
ps           picoseconds
RMSD         root mean squared displacement
vdW          van der Waals
VMD          Visual Molecular Dynamics